ABSTRACT

We present tables of critical values for a new
multivariate goodness-of-fit test introduced by
Foutz. Some details of our improved asymptotic
approximation and an evaluation of its accuracy
are given.



1. Introduction 



Foutz [1] proposed a new goodness-of-fit test for fitting
univariate as well as multivariate distributions. He showed
that the null distribution of the test statistic, $F_n$, does not
depend on (1) the hypothesized distribution, or (2) the number
of components in the random vector under study. An integral
representation for the null CDF of $F_n$ was provided. Closed
form expressions for this null distribution are quite
difficult to obtain, even for small sample sizes. The
alternative has been to approximate the distribution by a
normal distribution with mean $e^{-1}$ and variance
$(2e^{-1} - 5e^{-2})/n$; this, however, does not appear to
provide a good approximation to the percentiles of the null
distribution of $F_n$ for moderate sample sizes.

The authors [2] compared the $F_n$-test with the Chi-squared
test and the Kolmogorov-Smirnov test and found that the $F_n$-test
does have higher power when fitting certain types of
distributions. Another investigation by the authors and Linhart [3]
examined the power of the $F_n$-test when fitting a multivariate
normal distribution; the test did well in detecting mean shifts
and variance shifts. We therefore believe that the $F_n$-test
is a definite alternative to the Chi-squared and
Kolmogorov-Smirnov tests when fitting univariate distributions, and it is
just about the only available test for fitting multivariate
distributions. However, the test is not very convenient for
applications because accurate critical values are difficult to
obtain. This paper fills the gap by providing tables
of approximate percentiles of the null distribution of $F_n$.

2. Description of the $F_n$-Test

The procedure for calculating the test statistic $F_n$ is the
following. Given a random sample $X_1, \ldots, X_{n-1}$ from
a continuous multivariate distribution, the sample space is
partitioned into $n$ statistically equivalent blocks. Let
$h_1(X), h_2(X), \ldots, h_{n-1}(X)$ be any $n-1$ "cutting functions"
such that $h_k(X)$ has a continuous distribution, $k = 1,
2, \ldots, n-1$, and let $k_1, k_2, \ldots, k_{n-1}$ be a permutation of
$1, 2, \ldots, n-1$. Let $X(k_1)$ be the sample vector corresponding
to the $k_1$th order statistic of $h_{k_1}(X_i)$, $i = 1, 2, \ldots, n-1$.
The initial partition of the sample space into two blocks is
defined by

$$B_1 = \{X : h_{k_1}(X) < h_{k_1}(X(k_1))\}, \quad \text{and} \quad B_2 = \{X : h_{k_1}(X) > h_{k_1}(X(k_1))\}.$$

The cutting function $h_{k_2}(X)$ is then used to partition $B_1$
(if $k_2 < k_1$) or $B_2$ (if $k_2 > k_1$) into two subblocks in a similar
fashion. When all the cutting functions are exhausted the
sample space will have been partitioned into $n$ statistically
equivalent blocks, $S_1, S_2, \ldots, S_n$. A convenient choice for
the cutting functions in the univariate case is the identity
function. In the multivariate case letting $h_k(X) = X^{(j)}$, the
$j$th component of $X$ (for various $j$), appears to work well.
More details on partitioning the sample space into statistically
equivalent blocks and some examples can be found in [3].






Once the statistically equivalent blocks are determined, a
computational formula for the test statistic $F_n$ for the
hypothesis that the samples are from a specified distribution
$H$ is

$$F_n = \sum_{i=1}^{n} \max\left[0, \frac{1}{n} - D_i\right],$$

where $D_i = P[X \in S_i \mid H]$.
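As an illustration of the computational formula, the following is a minimal sketch for the univariate case only, assuming the identity cutting function, so that the blocks are the intervals between consecutive order statistics. The function name `foutz_statistic` and the argument `cdf` (the hypothesized CDF $H$) are names chosen here for illustration and do not come from the original report.

```python
def foutz_statistic(sample, cdf):
    """Univariate F_n = sum_i max(0, 1/n - D_i) for a sample of size n-1.

    Assumes the identity cutting function, so block i is the interval
    between the (i-1)th and ith order statistics, and D_i is its
    probability under the hypothesized CDF.
    """
    xs = sorted(sample)
    n = len(xs) + 1                      # number of blocks
    # Hypothesized CDF at the block boundaries (-inf, order statistics, +inf).
    u = [0.0] + [cdf(x) for x in xs] + [1.0]
    # D_i: hypothesized probability content of block i.
    d = [u[i + 1] - u[i] for i in range(n)]
    return sum(max(0.0, 1.0 / n - di) for di in d)


if __name__ == "__main__":
    # Two observations tested against the uniform distribution on [0, 1].
    print(foutz_statistic([0.1, 0.5], lambda x: x))
```

Only blocks whose hypothesized probability falls below $1/n$ contribute, so a sample whose order statistics split the hypothesized distribution into equal parts gives $F_n = 0$.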

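The integral representation for the null CDF of $F_n$ results
in the following closed form expressions for $n$ = 3, 4, and 5.

$$P[F_3 \le x] = \begin{cases}
6x^2, & 0 \le x \le \frac{1}{3} \\
1 - 3\left(\frac{2}{3} - x\right)^2, & \frac{1}{3} < x \le \frac{2}{3} \\
1, & x > \frac{2}{3}
\end{cases}$$

$$P[F_4 \le x] = \begin{cases}
20x^3, & 0 \le x \le \frac{1}{4} \\
-20x^3 + 18x^2 - \frac{9}{4}x + \frac{1}{16}, & \frac{1}{4} < x \le \frac{1}{2} \\
1 - 4\left(\frac{3}{4} - x\right)^3, & \frac{1}{2} < x \le \frac{3}{4} \\
1, & x > \frac{3}{4}
\end{cases}$$

$$P[F_5 \le x] = \begin{cases}
70x^4, & 0 \le x \le \frac{1}{5} \\
-105x^4 + 80x^3 - 12x^2 + \frac{16}{25}x - \frac{1}{125}, & \frac{1}{5} < x \le \frac{2}{5} \\
45x^4 - 80x^3 + \frac{228}{5}x^2 - \frac{176}{25}x + \frac{31}{125}, & \frac{2}{5} < x \le \frac{3}{5} \\
1 - 5\left(\frac{4}{5} - x\right)^4, & \frac{3}{5} < x \le \frac{4}{5} \\
1, & x > \frac{4}{5}
\end{cases}$$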

It does not appear to be possible to generate a closed
form expression for the CDF of $F_n$ in the general case. Foutz's
large sample normal approximation is given by

$$P[F_n \le x] \approx \Phi\left(\frac{n(x - e^{-1})}{\sqrt{(2e^{-1} - 5e^{-2})\,n}}\right), \qquad (1)$$

where $\Phi$ is the standard normal CDF. To check the accuracy of
this approximation in our earlier study [2], we generated
samples of size n-1 = 20, 30, and 50 from a uniform distribution
on [0,1] and tested the hypothesis that the samples are in
fact from that distribution. The empirical significance levels
in 80,000 replications are given in Table 1.



Nominal                       n-1
Significance
Level              20         30         50

0.10             0.0757     0.0800     0.0859
0.05             0.0372     0.0399     0.0428
0.01             0.0082     0.0083     0.0093

Table 1

Empirical Significance Level
(Based on 80,000 replications)
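The experiment summarized in Table 1 can be sketched as follows. This is a minimal illustration and not the original program: it assumes Python's built-in generator, the identity cutting function, and far fewer replications than the 80,000 used in the study. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

E_INV = math.exp(-1.0)
VAR_FACTOR = 2.0 * math.exp(-1.0) - 5.0 * math.exp(-2.0)


def normal_critical_value(n, alpha):
    """Upper-alpha critical value from the large sample normal approximation (1)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return E_INV + z * math.sqrt(VAR_FACTOR / n)


def simulate_statistic(n, rng):
    """One draw of F_n from n-1 uniform observations on [0, 1]."""
    cuts = [0.0] + sorted(rng.random() for _ in range(n - 1)) + [1.0]
    return sum(max(0.0, 1.0 / n - (cuts[i + 1] - cuts[i])) for i in range(n))


def empirical_level(sample_size, alpha, reps=20000, seed=0):
    """Fraction of null samples rejected at nominal level alpha."""
    n = sample_size + 1
    rng = random.Random(seed)
    crit = normal_critical_value(n, alpha)
    rejections = sum(simulate_statistic(n, rng) > crit for _ in range(reps))
    return rejections / reps


if __name__ == "__main__":
    for m in (20, 30, 50):
        print(m, [round(empirical_level(m, a), 4) for a in (0.10, 0.05, 0.01)])
```

With a modest number of replications the estimates are noisy, but they should fall below the nominal levels in the same way as the entries of Table 1.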

It can be seen that the observed significance levels are
consistently smaller than the nominal values by about 10-20%.
We therefore proposed the use of Monte Carlo critical values,
which were based on 25,000 replications. These values are
given in Table 2 and the corresponding observed significance
levels, based on 225,000 subsequent repetitions, are given in
Table 3.

Nominal                       n-1
Significance
Level              20         30         50

0.10             0.42714    0.41903    0.40816
0.05             0.44865    0.43553    0.42116
0.01             0.48659    0.46579    0.44487

Table 2

Monte Carlo Critical Values
(Based on 25,000 replications)



Nominal                       n-1
Significance
Level              20         30         50

0.10             0.1006     0.9700     0.1003
0.05             0.0486     0.0486     0.0498
0.01             0.0103     0.0101     0.0102

Table 3

Empirical Significance Level
(Based on 225,000 replications)



The above findings led us to search for an improved
approximation for determining the percentiles of the null
distribution of $F_n$. We found that allowing the mean and variance
to be functions of the sample size leads to greatly improved
approximations. While it is difficult to give precise error
bounds on the percentile values, our computational experience
indicates about four decimal place accuracy. This usually leads to
rejection rates with errors in the fourth decimal place.
Compared with the asymptotic approximation (1) given by Foutz,
our approximation reduces the error in the rejection rates
by a factor of 10 or more.

3. Modified Normal Approximation



The data for the approximation of the null distribution of
the Foutz statistic were obtained by Monte Carlo methods. For
a given sample size n-1, sequences of n-1 uniformly distributed
numbers were generated using the IMSL* random number generator
GGUBS. The Foutz statistic was then computed and tabulated
into one of 200 equilength intervals. This process was
replicated 25,000 times. The entire data set consists of the
empirical cumulative distribution functions obtained in this
way for 60 sample sizes, n-1 = 2(1)40, 40(2)70, and 70(5)100.
Potentially this yields as many as 12,000 pieces of data;
however, if only intervals with nontrivial data in them are
counted, this is reduced to about 4700.
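The tabulation step can be sketched as below. This is an illustration under stated assumptions rather than the original program: Python's built-in generator replaces GGUBS, the 200 equal-length intervals are assumed to span [0, 1] (the report does not state the range), and an interval is counted as "nontrivial" when the empirical CDF there lies strictly between 0 and 1. All function names are hypothetical.

```python
import random


def simulate_statistic(sample_size, rng):
    """Foutz statistic for one uniform sample of the given size."""
    n = sample_size + 1
    cuts = [0.0] + sorted(rng.random() for _ in range(sample_size)) + [1.0]
    return sum(max(0.0, 1.0 / n - (cuts[i + 1] - cuts[i])) for i in range(n))


def tabulated_ecdf(sample_size, reps=25000, bins=200, seed=0):
    """Empirical CDF of the statistic at the right edge of each interval."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(reps):
        f = simulate_statistic(sample_size, rng)
        counts[min(int(f * bins), bins - 1)] += 1
    ecdf, running = [], 0
    for c in counts:
        running += c
        ecdf.append(running / reps)
    return ecdf


def nontrivial_points(ecdf):
    """Number of intervals where the empirical CDF is strictly inside (0, 1)."""
    return sum(1 for v in ecdf if 0.0 < v < 1.0)


if __name__ == "__main__":
    ecdf = tabulated_ecdf(20)
    print(nontrivial_points(ecdf), "of", len(ecdf), "intervals carry information")
```

Because the statistic concentrates near $e^{-1}$ as the sample size grows, the number of informative intervals shrinks for larger samples, which is consistent with the reduction from 12,000 potential points to about 4700.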

A data fitting problem with 4700 points is not easily handled 
unless a linear model is accepted. We do not know the behavior 
of the distribution as the sample size gets large, so we were 
reluctant to impose a form with only linear parameters, 
especially in sample size. We decided on attempting a correction 
to the asymptotic approximation given by Foutz. 

After some experimentation with various types of corrections,
it was decided the most reasonable was to include correction
terms in the argument of the asymptotic approximation. In
order to make the computation feasible it was decided to fit the data
in a two pass scheme; first the null distribution for each sample
size was approximated as below, and then the parameters in these
approximations were fit by functions of sample size.

* International Mathematics and Statistical Libraries, 7500
Bellaire Drive, Houston, TX 77036

The precise form of the approximation was through the
argument of a normal distribution, which was taken to be of
the form

$$\frac{a + b\,n(x - e^{-1}) + c\,(x - e^{-1})^2}{\sqrt{(2e^{-1} - 5e^{-2})\,n}}.$$

Because we are strongly interested in the inverse CDF, the data
were weighted at each point by the centered difference from the
Monte Carlo data, which resulted in a greater weight on
the part of the curve with a large slope. This
least squares process yielded a table of values of a, b, and
c versus sample size (actually we consider them as functions
of n = sample size + 1). We observe that the amount of
scatter increases as n increases. There tends to be even more
scatter with higher powers of $(x - e^{-1})$. For this reason it was
decided to weight the smaller sample sizes more heavily, and a
weight of 1/n was adopted. Since the data are more dense for
smaller sample sizes this results in considerably less weight
for the large sample sizes, although we feel the trend is still
properly modelled and that our approximation is considerably
better than the asymptotic approximation for very large sample
sizes, say even up to 1000.

In the second stage of the process the form of the coefficients a, b,
and c was chosen to allow the rate of decay (or growth) of the
coefficients to be dictated by the data. Thus we fit a, b, and
c with functions of the form $A + Bn^C$.

For the terms which are constant and linear in $(x - e^{-1})$ the
exponent was negative; for $c(n)$, however, the exponent was
positive, indicating that the term grows (somewhat slower than
linearly) with sample size. We do not consider this
bothersome, since the linear term in $(x - e^{-1})$
has already (due to the form of the asymptotic approximation)
been included with a factor that grows linearly with sample
size.

The overall result of this nonlinear least squares
approximation is the approximate CDF involving the nine
parameters,

$$P[F_n \le x] \approx \Phi\left[\frac{g(x)}{\sqrt{(2e^{-1} - 5e^{-2})\,n}}\right], \qquad (2)$$

where $g(x) = a(n) + b(n)\,n\,(x - e^{-1}) + c(n)\,(x - e^{-1})^2$, and

$$a(n) = 0.2039 + 0.1876\,n^{-1.4416},$$
$$b(n) = 1.0015 - 0.05672\,n^{-0.7377},$$
$$c(n) = 0.3049 - 0.5912\,n^{0.8927}.$$
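Equation (2) is straightforward to evaluate and to invert numerically. The sketch below is an illustration, not the authors' program: the function names are hypothetical, inversion is done by simple bisection on [0, 1], and the exponent in $c(n)$ is read as 0.8927. That reading is an assumption made here because the scanned digits are unclear; it is the value for which the computed percentiles agree with the Table 6 entries to about four decimal places.

```python
import math
from statistics import NormalDist

E_INV = math.exp(-1.0)
VAR_FACTOR = 2.0 * math.exp(-1.0) - 5.0 * math.exp(-2.0)


def coefficients(n):
    """Fitted coefficient functions a(n), b(n), c(n); n = sample size + 1."""
    a = 0.2039 + 0.1876 * n ** -1.4416
    b = 1.0015 - 0.05672 * n ** -0.7377
    c = 0.3049 - 0.5912 * n ** 0.8927   # exponent as read here (assumption)
    return a, b, c


def approx_cdf(x, n):
    """Approximate null CDF of F_n from the modified normal approximation (2)."""
    a, b, c = coefficients(n)
    t = x - E_INV
    g = a + b * n * t + c * t * t
    return NormalDist().cdf(g / math.sqrt(VAR_FACTOR * n))


def approx_percentile(p, n, tol=1e-10):
    """Invert the approximate CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if approx_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Sample size 100 corresponds to n = 101.
    for p in (0.90, 0.95, 0.99):
        print(p, round(approx_percentile(p, 101), 5))
```

Bisection is used because the approximate CDF is increasing over the range of interest, so no derivative is needed; the tabulated sample sizes are n-1, so the argument passed for a sample of size 100 is n = 101.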



In order to test our results, two different approaches
were taken. First, the numbers of rejections for previously run
tests were available for sample sizes of n-1 = 20, 30, and 50,
at (approximately) the 0.10, 0.05, and 0.01 levels. By
computing the derivative of the approximate CDF, Equation (2),
and making a correction along the tangent line, we were able to
estimate the anticipated rejection rate that would occur with
our present approximation. These data were accumulated over
225,000 replications, and are given in Table 4. The main entry
is the anticipated rejection rate when using the results of our
approximation, above. As a point of comparison with Foutz's
asymptotic approximation, we include the corresponding rates
for it in parentheses. Second, to test the approximation for
a smaller, as well as an intermediate, sample size, we computed
the Foutz statistic for 300,000 uniformly distributed samples
of sizes 10 and 40, and tabulated them at intervals of 0.0001 in
the range of interest. The results of these calculations are
shown in Table 5 for the 0.10, 0.05, and 0.01 levels.



Nominal                                  n-1
Significance
Level                20                  30                  50

0.10           0.0994 (0.0764)     0.1002 (0.0801)     0.1007 (0.0840)
0.05           0.0496 (0.0385)     0.0500 (0.0402)     0.0505 (0.0420)
0.01           0.0098 (0.0085)     0.0095 (0.0086)     0.0098 (0.0088)

Table 4

Anticipated Rejection Rates
From Approximate Critical Values
(Based on 225,000 replications)






Nominal                        n-1
Significance
Level                10                  40

0.10           0.0989 (0.0687)     0.0998 (0.0824)
0.05           0.0481 (0.0349)     0.0491 (0.0087)
0.01           0.0086 (0.0069)     0.0098 (0.0087)

Table 5

Empirical Significance Levels
(Based on 300,000 replications)

As is shown by the tables, we expect the error in the
rejection rates due to use of our approximate percentiles to
be smaller by a factor of 10-20 for the 0.20 to 0.05 levels
than it is for Foutz's normal approximation. At the
extreme tails, our approximation is not as good as at the more
moderate levels, but is still a worthwhile improvement over the
asymptotic approximation.

Table 6 lists some upper percentiles of the approximate
CDF given by Equation (2) for sample sizes 5(1)100, 100(10)200,
and 200(100)1000. The exact values are given for n-1 = 2, 3,
and 4. Since we expect the entries to have about 4 digit
accuracy, linear interpolation for intermediate sample sizes
will have comparable accuracy. Linear interpolation in the
percentiles is not accurate, and other percentiles should be
calculated from Equation (2). It is interesting to observe the
"surface" of the null CDF in a perspective plot, as in Figure
1. Of course, only discrete slices exist; the cross section
lines in the direction of sample size are an artifact of the
plotting package. The convergence toward a sharp rise of the
CDF in the vicinity of $x = e^{-1}$ as sample size increases is very
apparent.
Figure 1: Null CDF of $F_n$
TABLE 6

Approximate percentage points for the null
distribution of the Foutz statistic
(Note: sample size is n-1)

Sample
size      0.80      0.85      0.90      0.95      0.975     0.99

  62      ...       ...       0.40434   0.41584   0.42591   0.43770
  63      0.39039   0.39644   0.40408   0.41549   0.42546   0.43716
  64      0.39024   0.39624   0.40382   0.41513   0.42503   0.43654
  65      0.39019   0.39604   0.40356   0.41479   0.42461   0.43612
  66      0.38995   0.39585   0.40332   0.41446   0.42420   0.43562
  67      0.38981   0.39567   0.40307   0.41413   0.42379   0.43512
  68      0.38967   0.39548   0.40283   0.41381   0.42340   0.43454
  69      0.38954   0.39531   0.40260   0.41349   0.42301   0.43417
  70      0.38940   0.39513   0.40237   0.41318   0.42263   0.43370
  71      0.38927   0.39496   0.40215   0.41288   0.42226   0.43325
  72      0.38914   0.39479   0.40193   0.41258   0.42190   0.43281
  73      0.38902   0.39462   0.40171   0.41229   0.42154   0.43237
  74      0.38889   0.39446   0.40150   0.41201   0.42119   0.43195
  75      0.38877   0.39430   0.40130   0.41173   0.42085   0.43153
  76      0.38865   0.39415   0.40109   0.41146   0.42051   0.43112
  77      0.38853   0.39399   0.40089   0.41119   0.42018   0.43072
  78      0.38842   0.39384   0.40070   0.41092   0.41985   0.43033
  79      0.38830   0.39369   0.40050   0.41067   0.41954   0.42994
  80      0.38819   0.39355   0.40032   0.41041   0.41923   0.42955
  81      0.38808   0.39340   0.40013   0.41016   0.41892   0.42919
  82      0.38797   0.39326   0.39995   0.40992   0.41862   0.42882
  83      0.38787   0.39312   0.39977   0.40968   0.41833   0.42846
  84      0.38776   0.39299   0.39959   0.40944   0.41804   0.42811
  85      0.38766   0.39285   0.39942   0.40921   0.41775   0.42776
  86      0.38756   0.39272   0.39925   0.40898   0.41747   0.42742
  87      0.38745   0.39259   0.39908   0.40875   0.41720   0.42709
  88      0.38736   0.39247   0.39891   0.40853   0.41693   0.42675
  89      0.38726   0.39234   0.39875   0.40831   0.41666   0.42643
  90      0.38717   0.39222   0.39859   0.40810   0.41640   0.42611
  91      0.38708   0.39209   0.39844   0.40789   0.41614   0.42580
  92      0.38698   0.39198   0.39828   0.40768   0.41589   0.42549
  93      0.38689   0.39186   0.39813   0.40748   0.41564   0.42519
  94      0.38683   0.39174   0.39798   0.40727   0.41539   0.42489
  95      0.38672   0.39163   0.39783   0.40708   0.41515   0.42459
  96      0.38663   0.39151   0.39768   0.40688   0.41491   0.42431
  97      0.38654   0.39140   0.39754   0.40669   0.41467   0.42402
  98      0.38646   0.39129   0.39740   0.40650   0.41444   0.42374
  99      0.38637   0.39118   0.39726   0.40631   0.41422   0.42346
 100      0.38629   0.39108   0.39712   0.40613   0.41399   0.42319
 110      0.38553   0.39009   0.39585   0.40443   0.41191   0.42067
 120      0.38485   0.38922   0.39473   0.40293   0.41009   0.41845
 130      0.38425   0.38844   0.39373   0.40161   0.40848   0.41651
 140      0.38371   0.38775   0.39285   0.40043   0.40705   0.41473
 150      0.38322   0.38712   0.39204   0.39937   0.40575   0.41321
 160      0.38278   0.38655   0.39131   0.39840   0.40458   0.41180
 170      0.38237   0.38603   0.39065   0.39752   0.40351   0.41051
 180      0.38199   0.38555   0.39004   0.39671   0.40253   0.40932
 190      0.38165   0.38511   0.38947   0.39597   0.40153   0.40823
 200      0.38132   0.38470   0.38895   0.39528   0.40079   0.40723
 300      0.37901   0.38176   0.38522   0.39038   0.39486   0.40009
 400      0.37760   0.37997   0.38297   0.38743   0.39130   0.39582
 500      0.37662   0.37874   0.38142   0.38540   0.38886   0.39289
 600      0.37589   0.37783   0.38027   0.38390   0.38706   0.39073
 700      0.37532   0.37711   0.37937   0.38273   0.38565   0.38905
 800      0.37485   0.37653   0.37865   0.38179   0.38451   0.38769
 900      0.37447   0.37605   0.37804   0.38100   0.38357   0.38657
1000      0.37414   0.37564   0.37753   0.38034   0.38277   0.38561

TABLE 6 (Continued)

Approximate percentage points for the null
distribution of the Foutz statistic
(Note: sample size is n-1)

Sample
size      0.995

  61      0.44642
  62      0.44580
  63      0.44519
  64      0.44460
  65      0.44402
  66      0.44345
  67      0.44289
  68      0.44235
  69      0.44182
  70      0.44130
  71      0.44079
  72      0.44029
  73      0.43980
  74      0.43932
  75      0.43886
  76      0.43840
  77      0.43794
  78      0.43750
  79      0.43707
  80      0.43664
  81      0.43622
  82      0.43581
  83      0.43541
  84      0.43501
  85      0.43462
  86      0.43424
  87      0.43386
  88      0.43349
  89      0.43313
  90      0.43277
  91      0.43242
  92      0.43207
  93      0.43173
  94      0.43139
  95      0.43106
  96      0.43074
  97      0.43042
  98      0.43010
  99      0.42979
 100      0.42949
 110      0.42666
 120      0.42419
 130      0.42201
 140      0.42006
 150      0.41831
 160      0.41673
 170      0.41529
 180      0.41397
 190      0.41275
 200      0.41162
 300      0.40366
 400      0.39890
 500      0.39564
 600      0.39324
 700      0.39137
 800      0.38986
 900      0.38861
1000      0.38755